Capstone Project

Pneumonia Detection

Problem Statement: Chest radiography is the most commonly performed diagnostic imaging technique. Because of the high volume of chest radiographs, reviewing each image manually is time-consuming and labor-intensive for radiologists. An automated solution that locates the position of inflammation in an image is therefore desirable. Such an automated pneumonia screening system can assist physicians in making better clinical decisions.

Business Domain Value: Automating Pneumonia screening in chest radiographs and reporting the affected area through bounding boxes can assist physicians in making better clinical decisions, or even replace human judgement in certain functional areas of healthcare (e.g., radiology). Guided by relevant clinical questions, AI techniques can unlock clinically relevant information hidden in large amounts of data, which in turn can assist clinical decision making.

Details about the data and dataset files are given at the link below: https://www.kaggle.com/c/rsna-pneumonia-detection-challenge/data

Purpose of this project: Identify patients with pneumonia. Some patients show symptoms, but it is not certain whether they are affected by the disease. This project also helps to identify those patients who have symptoms and might be affected by the disease.

Pre-Processing, Data Visualization, EDA

Compare the labels and class information for possible join

Data Inference: A join or merge should typically give us a dataset with 30,227 rows, assuming we keep all rows and drop the redundant 'patientId' column.

Check uniqueness of the data

Approach:

There could be duplicate patientId entries, resulting in multiple bounding boxes with their respective target classification/class information. Check whether the sequence of records is synchronous between the "train labels" and "class information" datasets. If it is, a simple join can be performed on the index.
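The approach above can be sketched in plain Python. This is only an illustration on toy rows, not the notebook's own code: the helper names `is_synchronous` and `join_on_index` are hypothetical, and the column names `Target` and `class` are assumed to match the Kaggle CSV files.

```python
# Toy rows standing in for the two CSV files (assumed column names).
train_labels = [
    {"patientId": "a1", "Target": 0},
    {"patientId": "b2", "Target": 1},
    {"patientId": "b2", "Target": 1},  # second bounding box, same patient
]
class_info = [
    {"patientId": "a1", "class": "Normal"},
    {"patientId": "b2", "class": "Lung Opacity"},
    {"patientId": "b2", "class": "Lung Opacity"},
]


def is_synchronous(left, right, key="patientId"):
    """Return True when both tables list the same key in the same row order."""
    return [row[key] for row in left] == [row[key] for row in right]


def join_on_index(left, right, key="patientId"):
    """Merge row i of left with row i of right, keeping the key column once."""
    if not is_synchronous(left, right, key):
        raise ValueError("row order differs; join on the key instead")
    return [
        {**l, **{k: v for k, v in r.items() if k != key}}
        for l, r in zip(left, right)
    ]


merged = join_on_index(train_labels, class_info)
```

If the order check fails, a positional join would silently pair the wrong rows, which is why the sketch raises instead of merging.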

Exploring Train Labels Dataset

Inference:

  1. There are 23,286 patients with a single entry and 3,398 patients with duplicate entries. To elaborate, these 3,398 patients have multiple indicators, ranging from 2 to 4 each.
  2. In total, there are 26,684 unique entries (23,286 + 3,398), meaning there are 26,684 patients in the training sample. If the training sample is consistent, we should have the same number in the "class information" dataset and also the same number of images in the training images directory.
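A count like the one above can be reproduced with a frequency table over the patient IDs. The following is a minimal sketch on made-up IDs, not the real data; the variable names are illustrative.

```python
from collections import Counter

# Made-up patient IDs; repeated IDs stand for multiple bounding boxes.
patient_ids = ["a1", "b2", "b2", "c3", "d4", "d4", "d4"]

rows_per_patient = Counter(patient_ids)

# Patients that appear exactly once versus more than once.
single = sum(1 for n in rows_per_patient.values() if n == 1)
multiple = sum(1 for n in rows_per_patient.values() if n > 1)

# Number of distinct patients and the largest number of rows for one patient.
total_patients = len(rows_per_patient)
max_rows = max(rows_per_patient.values())
```

On the real training labels, `single + multiple` should equal the number of distinct patients, which is the consistency check described in the second point.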

Exploring Class Information Dataset

Merging Data

Shape of Dataset

1. Class Distribution

Observation:

The above graph shows the total number of records in each class. Records labelled No Lung Opacity / Not Normal are the most numerous, compared with Lung Opacity and Normal.

8,851 (29.3%) records do not have any disease

9,555 (31.6%) records have Lung Opacity

11,821 (39.1%) records have No Lung Opacity / Not Normal
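The percentages follow directly from the three counts listed above. As a quick arithmetic check (the dictionary keys are just labels chosen for this sketch):

```python
# Record counts per class, as reported in the observation above.
class_counts = {
    "Normal": 8851,
    "Lung Opacity": 9555,
    "No Lung Opacity / Not Normal": 11821,
}

total = sum(class_counts.values())

# Share of each class in percent, rounded to one decimal place.
shares = {name: round(100 * n / total, 1) for name, n in class_counts.items()}
```

The total of the three counts is 30,227, which matches the number of rows expected from the merge.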

2. Target to Class

Observation:

From the above graph, we can see that 31.6% of records fall in the Pneumonia category and 68.4% in the Non-Pneumonia category.

3. Impact of patient's age on pneumonia

Observation:

4. Relation between patient's age and different classes

Normal Data Distribution

From the above graph, the Normal class is most frequent among patients aged 40-60.

No Lung Opacity / Not Normal Data Distribution

Observation: The above graph shows that patients aged 40-60 account for the largest number of cases.

Lung Opacity Data Distribution

Observation:

From the above graph, it is clearly visible that most patients are between 40 and 60 years of age.

5. Data Distribution on Patient Gender
6. Different classes as per Patient Gender

Observation:

Male patients have more records than female patients across the different classes.

7. Impact on Age and Gender
8. Age and Pneumonia relation

Observation:

From the above graphs, we can clearly see that patients aged 50 to 65 have more cases of Pneumonia.

9. View position on different features

Class

Observation:

Patients with the AP view position have a larger number of records.

Target

Observation:

Patients with the AP view position have more records than those with PA.

Gender Wise

Observation:

Patients with the AP view position have more records than those with PA.

Age wise

Observation:

It clearly shows that the AP view position has more records than PA.

1. Image Loading - Plotting different X-Rays
2. Heat Maps with Bounding Boxes
3. Drawing the Heat Maps

Observation:

The above heat map depicts a high-level view of overall Pneumonia cases.
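One way such a heat map could be built is to count, for every pixel, how many bounding boxes cover it. The sketch below is an assumption about the technique, not the notebook's code: `box_heat_map` is a hypothetical helper, boxes are taken as integer `(x, y, width, height)` tuples, and the grid is kept tiny so the result is easy to inspect.

```python
def box_heat_map(boxes, size):
    """Count, for every cell of a size x size grid, how many boxes cover it."""
    heat = [[0] * size for _ in range(size)]
    for x, y, width, height in boxes:
        # Clip each box to the grid so out-of-range boxes do not raise errors.
        for row in range(max(0, y), min(size, y + height)):
            for col in range(max(0, x), min(size, x + width)):
                heat[row][col] += 1
    return heat


# Two overlapping toy boxes given as (x, y, width, height).
boxes = [(1, 1, 3, 2), (2, 2, 4, 3)]
heat = box_heat_map(boxes, size=8)
```

Cells covered by both boxes end up with a count of 2, so regions where opacities occur most often show up as the hottest areas once the grid is plotted.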

Observation:

From the above heat map, we can clearly see that the AP position has more Pneumonia cases than the PA position.

Observation:

From the above heat map, we can clearly see that records with 2 bounding boxes have a higher Pneumonia count than the others.

Observation:

From the above heat maps, we can clearly see that male patients have more Pneumonia cases than female patients.

Observation:

From the above heat maps, it is clear that Pneumonia mostly affects patients between 40 and 60 years of age.

Model Building

Data Processing

Extract Data from DICOM file

Splitting Data into relative classes

UNet
VGG19

Save the Model

Plotting Accuracy and Validation Accuracy

Plotting Loss and Validation Loss

Model testing

ROC Curve and AUC
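For reference, the area under the ROC curve equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below computes it that way on toy values; `roc_auc` is a hypothetical helper and the labels and scores are not model output from this project.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    # A positive ranked above a negative counts 1, a tie counts 0.5.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy labels (1 = Pneumonia) and predicted scores.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The pairwise form is quadratic in the number of samples, so it suits small checks like this one; for a full test set a library routine would be the more practical choice.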

Prediction
1. VGG16
2. Compile Model
3. Train the Model

Saving the Model

Plotting accuracy and validation accuracy

Plotting Loss and Validation Loss

Model Testing

ROC Curve

2. ResNet50

Model Compilation

Model Training

Saving the Model

Plotting Accuracy and Validation Accuracy

Plotting Loss and Validation Loss

Model Testing

Classification Report and ROC Curve

3. InceptionNet v3

Model Compilation

Model Training

Saving the Model

Plotting Accuracy and Validation Accuracy

Plotting Loss and Validation Loss

Model testing

Classification report & ROC Curve